Introduce expression/type formatting functor #2192

kroening · 2018-05-17T11:02:12Z

The interface comes with an exemplar of an application (show_symbol_table).
Note that this can replace util/format_expr.h and util/format_type.h.

This is extensible, i.e., it is possible to make the debug_formatter treat particular cases differently; it is possible to pass further data along with the formatter, say a symbol table, a whitespace policy, etc., if needed. The formatter can be stateful (say for indents, parentheses, etc.).

tautschnig · 2018-05-17T11:21:05Z

So what about the language-specific output? Is it a sound assumption that all symbols in the symbol table will always belong to a single language?

martin-cs

The changes in "my" part look OK but I have a bigger question -- how is this going to work with expr2c, expr2java, expr2whatever, etc.? Without some link to those it seems like just extra wrapping.

kroening · 2018-05-18T10:45:54Z

An option is to build expr2c (or other languages) by extending this one; the obvious, low-effort step is to add a wrapper.
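
The "extend the base formatter" option could look roughly like the following. This is a minimal sketch, not the code in this PR: `toy_exprt`, `toy_formattert`, `toy_c_formattert` and `format_to_string` are hypothetical names, and the boolean-constant case is only an illustration of where C syntax would differ from a language-neutral debug syntax.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an expression; only what the sketch needs.
struct toy_exprt
{
  std::string id;    // e.g. "constant_bool" or "symbol"
  std::string value; // e.g. "true" or an identifier
};

// Base formatter: language-neutral debug syntax.
class toy_formattert
{
public:
  virtual ~toy_formattert() = default;

  virtual std::ostream &format(std::ostream &os, const toy_exprt &expr) const
  {
    if(expr.id == "constant_bool")
      return os << (expr.value == "true" ? "TRUE" : "FALSE");
    return os << expr.value;
  }
};

// C formatter: overrides only the cases where C syntax differs and
// delegates everything else to the base class.
class toy_c_formattert : public toy_formattert
{
public:
  std::ostream &format(std::ostream &os, const toy_exprt &expr) const override
  {
    if(expr.id == "constant_bool")
      return os << (expr.value == "true" ? "1" : "0");
    return toy_formattert::format(os, expr);
  }
};

// Convenience helper for the sketch: format into a string.
std::string format_to_string(const toy_formattert &f, const toy_exprt &e)
{
  std::ostringstream os;
  f.format(os, e);
  return os.str();
}
```

A front-end would then hand out its own derived formatter, while code that only wants debug output keeps using the base class.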

kroening · 2018-05-18T10:50:42Z

Language-specific output: The idea is that we use the language of the application: cbmc prints C syntax, jbmc prints Java syntax, adabmc prints Ada, etc.
One could contemplate what to do for mixed-language environments.

tautschnig · 2018-05-18T10:57:48Z

How about not defining a global, static symbol, and instead forcing each front-end to define it? Then at least the front-end has to make a conscious choice.
goto-diff and goto-instrument will need some extra work, because they should at times generate language-specific output but don't have a fixed source language.

kroening · 2018-05-18T11:33:58Z

Yes, the idea is that every frontend provides one of these. The one in formatter is really meant for debugging only, and it's hoped that the name makes this clear.
Yes, these do make the case for a formatter factory, similar in style to Peter's light-weight language API.

tautschnig · 2018-05-18T11:46:47Z

If the debug formatter weren't a static object then front-ends could also configure it to produce, e.g., json-compatible output. Thus I'd still like to see this static object go away.

kroening · 2018-05-18T12:02:00Z

debug formatter: but then pass down a formatter basically everywhere that might want to debug?

tautschnig · 2018-05-18T12:06:56Z

> debug formatter: but then pass down a formatter basically everywhere that might want to debug?

No, you can safely leave the extern debug_formattert debug_formatter; in place and make front-ends actually create that object. Maybe that declaration should become a std::unique_ptr though. An alternative option is to make the debug_formattert configurable - leaving the statically allocated object in place, but allowing front-ends to change its behaviour.
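
The "configurable, but still statically allocated" alternative might be sketched as follows. All names here (`sketch_formattert`, `plain_formattert`, `quoted_formattert`, `sketch_debug_formattert`, `set_formatter`) are hypothetical and chosen for illustration; the actual interface in this PR formats expressions and types rather than strings.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical formatter interface; strings stand in for expressions.
class sketch_formattert
{
public:
  virtual ~sketch_formattert() = default;
  virtual std::ostream &format(std::ostream &, const std::string &) const = 0;
};

// Default behaviour: print the text unchanged.
class plain_formattert : public sketch_formattert
{
public:
  std::ostream &format(std::ostream &os, const std::string &s) const override
  {
    return os << s;
  }
};

// Alternative behaviour a front-end might install: wrap in double quotes.
class quoted_formattert : public sketch_formattert
{
public:
  std::ostream &format(std::ostream &os, const std::string &s) const override
  {
    return os << '"' << s << '"';
  }
};

// The wrapper owns the implementation, so no raw pointer is exposed.
class sketch_debug_formattert
{
public:
  sketch_debug_formattert() : impl(new plain_formattert())
  {
  }

  // Front-ends replace the behaviour here; a null pointer is ignored.
  void set_formatter(std::unique_ptr<sketch_formattert> f)
  {
    if(f)
      impl = std::move(f);
  }

  std::ostream &operator()(std::ostream &os, const std::string &s) const
  {
    return impl->format(os, s);
  }

private:
  std::unique_ptr<sketch_formattert> impl;
};

// The single static object stays in place, as suggested above.
sketch_debug_formattert sketch_debug_formatter;
```

Call sites keep writing `sketch_debug_formatter(os, ...)` and need no extra parameter; only the front-end's start-up code calls `set_formatter`.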

peterschrammel · 2018-05-18T12:22:19Z

I think tools called goto-* should really operate on goto-binaries only and also only produce language-independent output.

kroening · 2018-05-18T12:27:04Z

The debug_formatter is now configurable.

tautschnig · 2018-05-18T12:27:18Z

> I think tools called goto-* should really operate on goto-binaries only and also only produce language-independent output.

I agree with that in principle, but that requires any such language-independent output to be properly tested and effectively become as good as expr2ct is today. It really means that it is not a debug tool anymore. Users can reasonably expect that there are no regressions visible to them when all that happens is under-the-hood cleanup.

tautschnig

I'm sure this will eventually be valuable, but right now it still has a few issues:

It is introducing even more objects of static lifetime, which seemed something that was to be gotten rid of.
There is no sensitivity towards plain text/JSON/XML output, with no apparent way of making this possible.
Maybe one of the commits could clarify the "why?"

tautschnig · 2018-05-19T14:36:37Z

src/util/formatter.h

+    return formatter->format(os, o);
+  }
+
+  formattert *formatter;


A raw public pointer seems very dangerous. There should be an API around it - in particular as this is, as far as I can tell, how it is made "configurable"?!

kroening

Sure, will wrap.

tautschnig · 2018-05-19T14:38:02Z

src/util/formatter.h

+  std::ostream &format(std::ostream &, const source_locationt &) override;
+};
+
+extern default_debug_formattert default_debug_formatter;


This should not be a global object. Create it when creating the debug_formattert. Actually I'd move the entire declaration to the .cpp file.

kroening

The key benefit is that it can then be a default argument. I understand overloading is an alternative, but that also creates many functions.

kroening · 2018-05-19T20:14:12Z

Can you clarify what sensitivity towards XML/JSON output would mean?

peterschrammel · 2018-05-19T20:15:45Z

How will we supply language-specific formatters to be used in the structured JSON/XML output (!= messages)?

tautschnig · 2018-05-20T08:44:02Z

> Can you clarify what sensitivity towards XML/JSON output would mean?

Escaping needs to be done differently (though this could be fixed up afterwards if need be).
Some output should be structured: an example that immediately comes to mind is source_locationt.
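
To illustrate the escaping point, here is a simplified sketch of how the same formatted expression string needs different treatment before being embedded in JSON or XML. The function names `escape_for_json` and `escape_for_xml` are hypothetical, and both functions are deliberately incomplete (for instance, JSON also requires escaping the remaining control characters).

```cpp
#include <string>

// Escape a string for use inside a JSON string literal (simplified).
std::string escape_for_json(const std::string &s)
{
  std::string out;
  for(char c : s)
  {
    switch(c)
    {
    case '"':
      out += "\\\"";
      break;
    case '\\':
      out += "\\\\";
      break;
    case '\n':
      out += "\\n";
      break;
    case '\t':
      out += "\\t";
      break;
    default:
      out += c;
    }
  }
  return out;
}

// Escape a string for use as XML character data or attribute value
// (simplified).
std::string escape_for_xml(const std::string &s)
{
  std::string out;
  for(char c : s)
  {
    switch(c)
    {
    case '&':
      out += "&amp;";
      break;
    case '<':
      out += "&lt;";
      break;
    case '>':
      out += "&gt;";
      break;
    case '"':
      out += "&quot;";
      break;
    default:
      out += c;
    }
  }
  return out;
}
```

For an expression such as `a < b && s == "x"`, only the quotes change for JSON, whereas XML additionally needs `<` and `&` replaced, which is why a formatter that emits a single plain string cannot know the right escaping on its own.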

tautschnig · 2018-05-21T09:54:04Z

A couple of thoughts as the overall design goals and plans aren't very clear to me. Just my opinion, I would very much welcome comments/feedback/push-back:

We should have an implementation that prints goto-program expressions in what we designate as goto-program-expression syntax. First, what is a "goto-program expression"? That's any expression appearing in a goto-program, with its semantics defined by our back-ends. In many cases the semantics will be the same as those of similarly-named expressions in C or Java, but this needn't be the case. Also there are some expressions that only exist in goto-program world, and not necessarily in any of the front-end languages (one of those is "byte extract"). We may choose to use a syntax that is very similar to that of C or Java to represent goto-program expressions, and implicitly we have been doing so all along. Let's clean this up and make clear that there is a syntax for goto-program expressions. I would argue that format_expr is taking care of this, once completely done. Completeness should be defined by "for every expression supported by our back-end, there must be a well-defined syntax." Notably, any use of pretty() ought to be replaced by UNREACHABLE. Running --show-goto-functions (while making --show-goto-functions use format) on all regression tests might be a decent starting point to test this.
If the above is acceptable, then there is no need for a debug_formatter or the likes as we would use format_expr as the one and only acceptable form of output for all back-end parts.
Places where front-end-language specific output is to be used should be limited to a) output of front-end expressions (such as warnings or errors during parsing, type-checking or conversion), b) output of results that are to be translated back to the input language, such as counterexample traces or test suites, c) expressions in the symbol table that have not been converted (for example, the value has not necessarily been converted, and should thus be printed using language-specific syntax), and d) dump-c or any other means that very explicitly translates goto-program expressions back to the input-language world.
expr2ct (and others deriving from it) should be changed to create valid C (Java, etc.) expressions in all cases. One example that comes to mind is convert_constant_bool, which still prints the non-C constants TRUE and FALSE. This would be perfectly fine in goto-program world, but should not exist in the input-language front-end. Support for printing, e.g., "byte_extract" would no longer be provided.
The functors introduced here may be a good idea, but they lack the one bit of configurability that we actually need when producing output: awareness of the target context language, i.e., is it plain text, JSON, XML, or something else? We might want to have a distinct syntax for goto-programs (and goto-program expressions) in plain text/JSON/XML, in which case the point of creating the output needs to be aware. We may also choose not to have this, in which case all we need to watch out for is doing proper escaping before embedding strings in JSON/XML output. Note that for C/Java, there is (AFAIK) no JSON/XML syntax, and thus such a distinction is not necessary - only the escaping part needs taking care of in case we embed such output in JSON/XML.

peterschrammel · 2018-05-21T12:52:00Z

@tautschnig, fully agree.

If we want to be really radical, we could go even one step further and define any non-debugging output (I'm talking about goto-programs, traces, etc) in a (structured) way that is independent of plain/JSON/XML and then provide a generic converter from the structured output specification to the respective formats. There is a massive potential for reducing code duplication in the various show_* classes...

tautschnig · 2018-05-21T14:52:15Z

> If we want to be really radical, we could go even one step further and define any non-debugging output (I'm talking about goto-programs, traces, etc) in a (structured) way that is independent of plain/JSON/XML [...]

Yes, I have thought along those lines as well, with the following result:

Most of the debate around format etc is about translating to a single (possibly large) string, i.e., does not collide with this idea.
source_locationt objects are a notable exception that we don't want to turn into just a single string.
I think we just need to distinguish string/key-value maps/arrays; mapping those to plain text incidentally is the least obvious/trivial part.
Since I don't really know what the current overall plan is I haven't actually made a stab at this. Maybe someone who knows about the plan can share it?
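
The string/key-value map/array distinction could be sketched as a small tree type with one renderer per output format. This is only an illustration under stated assumptions: `nodet`, `make_string`, `make_map`, `add_entry`, `to_json`, `to_plain` and `example_source_location` are hypothetical names, the JSON renderer omits escaping, and the plain-text layout is an arbitrary choice (which is exactly the non-obvious part mentioned above).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Format-independent output: a string, a key-value map, or an array.
struct nodet
{
  enum class kindt { STRING, MAP, ARRAY };
  kindt kind = kindt::STRING;
  std::string text;              // payload for STRING
  std::vector<std::string> keys; // keys for MAP, parallel to children
  std::vector<nodet> children;   // MAP values or ARRAY elements
};

nodet make_string(std::string s)
{
  nodet n;
  n.text = std::move(s);
  return n;
}

nodet make_map()
{
  nodet n;
  n.kind = nodet::kindt::MAP;
  return n;
}

void add_entry(nodet &map, std::string key, nodet value)
{
  map.keys.push_back(std::move(key));
  map.children.push_back(std::move(value));
}

// JSON renderer (no escaping, to keep the sketch short).
std::string to_json(const nodet &n)
{
  if(n.kind == nodet::kindt::STRING)
    return "\"" + n.text + "\"";
  const bool is_map = n.kind == nodet::kindt::MAP;
  std::string out = is_map ? "{" : "[";
  for(std::size_t i = 0; i < n.children.size(); ++i)
  {
    if(i != 0)
      out += ", ";
    if(is_map)
      out += "\"" + n.keys[i] + "\": ";
    out += to_json(n.children[i]);
  }
  return out + (is_map ? "}" : "]");
}

// Plain-text renderer: key=value pairs separated by spaces,
// array elements separated by ", ".
std::string to_plain(const nodet &n)
{
  if(n.kind == nodet::kindt::STRING)
    return n.text;
  const bool is_map = n.kind == nodet::kindt::MAP;
  std::string out;
  for(std::size_t i = 0; i < n.children.size(); ++i)
  {
    if(i != 0)
      out += is_map ? " " : ", ";
    if(is_map)
      out += n.keys[i] + "=";
    out += to_plain(n.children[i]);
  }
  return out;
}

// A source location kept structured instead of flattened to one string.
nodet example_source_location()
{
  nodet loc = make_map();
  add_entry(loc, "file", make_string("main.c"));
  add_entry(loc, "line", make_string("42"));
  return loc;
}
```

With this shape, an expression would still be a single STRING node produced by whatever formatter is in use, while a source location stays a MAP that each renderer lays out in its own way.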

kroening · 2018-05-22T08:27:52Z

The 'radical' idea certainly has appeal; however, it's not trivial. As the simplest example, JSON prefers camelCase, whereas XML uses a different convention. Plain text has further human preferences attached, which requires further special-casing.

I'd be tempted to remove that redundancy by killing XML eventually; fewer and fewer people care.

martin-cs · 2018-05-22T11:29:26Z

Although I'm far from a fan of XML, I think there were a number of commercial users for it last time I checked.

TGWDB · 2021-02-23T10:31:05Z

Closing this PR as the discussion implies that a different solution is desirable and this PR is not going to be completed/approved in current form. Unless the discussion/opinions change significantly, recommend a new PR for the new approach rather than reopen this one to continue this approach.

If you believe this has been closed erroneously please reopen.

kroening requested review from tautschnig and peterschrammel May 17, 2018 11:02

kroening requested review from chrisr-diffblue, martin-cs, romainbrenguier, smowton and thk123 as code owners May 17, 2018 11:02

kroening force-pushed the format-functor branch from 762388c to 8d6bef0 Compare May 17, 2018 11:17

martin-cs approved these changes May 17, 2018

kroening force-pushed the format-functor branch from 8d6bef0 to aac9fdd Compare May 18, 2018 11:53

kroening force-pushed the format-functor branch from aac9fdd to 82a5d95 Compare May 18, 2018 12:14

kroening requested review from cesaro and mgudemann as code owners May 18, 2018 12:14

Daniel Kroening added 2 commits May 18, 2018 13:26

functor for language-specific expression formatting

9cb2c2a

replace usage of format() by debug_formatter()

677ea94

kroening force-pushed the format-functor branch from 82a5d95 to 3fa28f0 Compare May 18, 2018 12:26

Daniel Kroening added 2 commits May 18, 2018 14:27

formatter for ANSI-C

fdf802a

formatter for Java

841d407

kroening force-pushed the format-functor branch from 3fa28f0 to 69ab0d1 Compare May 18, 2018 13:27

C++ formatter

1e7fb0a

kroening force-pushed the format-functor branch 2 times, most recently from db211d6 to d46e8e5 Compare May 18, 2018 16:05

use formatter in show_symbol_table

f93fb2c

kroening force-pushed the format-functor branch from d46e8e5 to f93fb2c Compare May 18, 2018 16:30

kroening assigned kroening and tautschnig and unassigned kroening May 19, 2018

tautschnig requested changes May 19, 2018

tautschnig assigned kroening and unassigned tautschnig May 19, 2018

peterschrammel mentioned this pull request Sep 6, 2018

Remove deprecated Java and string options [TG-4282] #2907

Merged

TGWDB closed this Feb 23, 2021
